Add support for autodetection of gres resources #181

Merged
1 commit merged into master from feature/gres-autodetect on May 21, 2025

Conversation

jovial (Contributor) commented Apr 23, 2025

Adds support for setting Slurm's AutoDetect option for gres resources, which removes the need to manually specify File in the gres dictionary. Only one autodetection mechanism can be used per node, otherwise Slurm will complain, which is why this is a per-nodegroup option rather than a per-gres option.

Example:

# group_vars/all/openhpc.yml

openhpc_nodegroups:
    - name: cpu
    - name: gpu
      gres_autodetect: nvml
      gres:
        - conf: "gpu:nvidia_h100_80gb_hbm3:2"
        - conf: "gpu:nvidia_h100_80gb_hbm3_4g.40gb:2"
        - conf: "gpu:nvidia_h100_80gb_hbm3_1g.10gb:6"

NB: autodetection requires a rebuild of the OpenHPC packages, which this role does not provide.
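
For contrast, here is a rough sketch of the manual form that `gres_autodetect` is meant to replace, assuming each gres entry takes a `file` key alongside `conf`, as the description above implies. The device path is illustrative only and is not taken from this PR.

```yaml
# group_vars/all/openhpc.yml (hypothetical, without autodetection)
openhpc_nodegroups:
    - name: gpu
      gres:
        - conf: "gpu:nvidia_h100_80gb_hbm3:2"
          # Assumed key and example path: device files have to be listed by hand
          file: "/dev/nvidia[0-1]"
```

With `gres_autodetect: nvml` set on the nodegroup, the `file` entry should no longer be needed.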

@jovial jovial requested a review from a team as a code owner April 23, 2025 17:07
@jovial jovial marked this pull request as draft April 23, 2025 20:20
@jovial jovial marked this pull request as ready for review April 24, 2025 09:04
sjpb (Collaborator) left a comment

Have some concerns

@jovial jovial requested a review from sjpb April 28, 2025 08:39
@jovial jovial marked this pull request as draft May 8, 2025 14:18
@jovial jovial changed the base branch from master to feat/nodegroups May 8, 2025 14:27
@jovial jovial marked this pull request as ready for review May 8, 2025 15:59
jovial (Contributor, Author) commented May 8, 2025

Ready for review, but merge #183 first (this PR targets that branch to avoid noise in the diff).

Base automatically changed from feat/nodegroups to master May 13, 2025 08:19
@jovial jovial changed the base branch from master to feat/nodegroups-v2 May 16, 2025 12:48
@jovial jovial force-pushed the feature/gres-autodetect branch 2 times, most recently from e3f58ad to 1ca4a4e Compare May 16, 2025 13:24
sjpb (Collaborator) left a comment

Few comments, but looks pretty good to me.

Base automatically changed from feat/nodegroups-v2 to master May 16, 2025 14:01
@jovial jovial force-pushed the feature/gres-autodetect branch 3 times, most recently from 4ed9a81 to e8c09aa Compare May 20, 2025 12:08
@jovial jovial force-pushed the feature/gres-autodetect branch from e8c09aa to facef75 Compare May 21, 2025 09:44
@jovial jovial requested a review from sjpb May 21, 2025 09:49
sjpb (Collaborator) left a comment

Looks great.

@sjpb sjpb merged commit 3354f7f into master May 21, 2025
29 checks passed
@sjpb sjpb deleted the feature/gres-autodetect branch May 21, 2025 12:14